| x | y | coint_pvalue | half_life | zero_cross | pct_outside_2sd | score |
|---|---|---|---|---|---|---|
| FELE | SPXC | 0.0100000 | 22.13237 | 0.0806324 | 0.0663507 | 0.9268802 |
| WSFS | ABCB | 0.0100000 | 18.60860 | 0.0735178 | 0.0663507 | 0.9217349 |
| FIBK | ONB | 0.0100000 | 21.60553 | 0.0798419 | 0.0545024 | 0.9182047 |
| TCBI | ABCB | 0.0100000 | 24.40199 | 0.0814229 | 0.0655608 | 0.9173677 |
| WSFS | ASB | 0.0100000 | 23.98015 | 0.0814229 | 0.0568720 | 0.9064063 |
| UCB | VLY | 0.0100000 | 19.54379 | 0.0640316 | 0.0631912 | 0.9049587 |
| WSFS | TCBI | 0.0100000 | 18.79628 | 0.0687747 | 0.0552923 | 0.8960633 |
| WSFS | VLY | 0.0100000 | 23.79069 | 0.0743083 | 0.0537125 | 0.8863476 |
| WSFS | HWC | 0.0100000 | 19.16024 | 0.0545455 | 0.0655608 | 0.8857265 |
| AUB | VLY | 0.0100000 | 24.43359 | 0.0861660 | 0.0339652 | 0.8837480 |
| RNST | TCBI | 0.0100000 | 14.75160 | 0.0845850 | 0.0323855 | 0.8673205 |
| HRI | NPO | 0.0100000 | 20.86016 | 0.0632411 | 0.0497630 | 0.8602992 |
| FELE | ESE | 0.0100000 | 23.91527 | 0.0798419 | 0.0236967 | 0.8587033 |
| IBOC | ONB | 0.0100000 | 14.34639 | 0.0830040 | 0.0268562 | 0.8519237 |
| RNST | ABCB | 0.0100000 | 25.05281 | 0.0608696 | 0.0458136 | 0.8465640 |
| NWE | POR | 0.0100000 | 18.68287 | 0.0474308 | 0.0466035 | 0.8412922 |
| FULT | HWC | 0.0100000 | 20.75629 | 0.0569170 | 0.0229068 | 0.8381763 |
| RDN | ESNT | 0.0100000 | 23.62156 | 0.0545455 | 0.0426540 | 0.8371403 |
| CATY | HWC | 0.0100000 | 19.69836 | 0.0490119 | 0.0308057 | 0.8365984 |
| HP | RIG | 0.0100000 | 23.56871 | 0.0474308 | 0.0497630 | 0.8339788 |
| FIBK | IBOC | 0.0100000 | 25.41529 | 0.0577075 | 0.0473934 | 0.8331478 |
| IBOC | AUB | 0.0100000 | 26.22029 | 0.0513834 | 0.0639810 | 0.8325338 |
| WSFS | UMBF | 0.0100000 | 25.87500 | 0.0513834 | 0.0576619 | 0.8296991 |
| SLG | MAC | 0.0121665 | 25.49511 | 0.0561265 | 0.0758294 | 0.8294071 |
| BKH | POR | 0.0100000 | 24.61036 | 0.0577075 | 0.0379147 | 0.8236308 |
| ASB | HWC | 0.0100000 | 28.48705 | 0.0545455 | 0.0505529 | 0.8219903 |
| RNST | WSFS | 0.0100000 | 26.04240 | 0.0577075 | 0.0363349 | 0.8199065 |
| UCB | AUB | 0.0100000 | 22.77765 | 0.0498024 | 0.0252765 | 0.8141762 |
| WHD | AROC | 0.0100000 | 12.88370 | 0.0788043 | 0.0217096 | 0.8102753 |
| HWC | VLY | 0.0100000 | 26.72348 | 0.0442688 | 0.0323855 | 0.7816520 |
| ACA | ESE | 0.0100000 | 13.44216 | 0.0619308 | 0.0236364 | 0.7810495 |
| HP | MUR | 0.0100050 | 36.30073 | 0.0513834 | 0.0576619 | 0.7785563 |
| FIBK | PIPR | 0.0100000 | 29.17451 | 0.0521739 | 0.0331754 | 0.7747629 |
| CATY | WSFS | 0.0100000 | 27.39795 | 0.0458498 | 0.0252765 | 0.7741406 |
| MHO | IBP | 0.0100000 | 33.96137 | 0.0608696 | 0.0308057 | 0.7669593 |
| WHD | LBRT | 0.0155545 | 19.78825 | 0.0692935 | 0.0257802 | 0.7656472 |
| FIBK | AX | 0.0100000 | 32.59884 | 0.0474308 | 0.0513428 | 0.7640921 |
| WHD | MUR | 0.0190183 | 20.31691 | 0.0638587 | 0.0502035 | 0.7622161 |
| CDP | EPRT | 0.0182614 | 20.65499 | 0.0947205 | 0.0465116 | 0.7586902 |
| IBOC | UCB | 0.0100000 | 25.92547 | 0.0332016 | 0.0331754 | 0.7574958 |
| VLY | UMBF | 0.0122166 | 31.77519 | 0.0498024 | 0.0560821 | 0.7564009 |
| NHI | RHP | 0.0100000 | 28.72155 | 0.0490119 | 0.0410742 | 0.7541022 |
| IBOC | INDB | 0.0100000 | 28.79464 | 0.0387352 | 0.0308057 | 0.7472181 |
| RNST | ASB | 0.0142920 | 29.73577 | 0.0553360 | 0.0410742 | 0.7395061 |
| OUT | SBRA | 0.0100000 | 29.34784 | 0.0513834 | 0.0268562 | 0.7365596 |
| NWE | BKH | 0.0176518 | 28.37077 | 0.0616601 | 0.0568720 | 0.7351792 |
| CDP | SBRA | 0.0100000 | 37.09768 | 0.0632411 | 0.0458136 | 0.7344280 |
| ABCB | UMBF | 0.0100000 | 37.02480 | 0.0403162 | 0.0442338 | 0.7302945 |
| FELE | WTS | 0.0179630 | 25.40347 | 0.0577075 | 0.0442338 | 0.7290248 |
| HP | WHD | 0.0100000 | 34.75352 | 0.0421196 | 0.0461330 | 0.7280741 |
| IBOC | AX | 0.0100000 | 31.90460 | 0.0363636 | 0.0363349 | 0.7223014 |
| ABCB | VLY | 0.0100000 | 31.28010 | 0.0324111 | 0.0221169 | 0.7199872 |
| MUR | RIG | 0.0173514 | 33.94921 | 0.0671937 | 0.0568720 | 0.7150243 |
| SR | NJR | 0.0109400 | 42.63456 | 0.0505929 | 0.0521327 | 0.7070444 |
| ABCB | HWC | 0.0253757 | 29.73378 | 0.0695652 | 0.0655608 | 0.7007080 |
| UCB | UMBF | 0.0100000 | 41.62628 | 0.0411067 | 0.0292259 | 0.6989972 |
| NHI | CTRE | 0.0100000 | 36.19026 | 0.0347826 | 0.0442338 | 0.6950458 |
| RNST | HWC | 0.0130178 | 37.97848 | 0.0569170 | 0.0252765 | 0.6937561 |
| IBOC | VLY | 0.0100000 | 48.24171 | 0.0387352 | 0.0568720 | 0.6905040 |
| TMHC | IBP | 0.0126774 | 39.01908 | 0.0592885 | 0.0363349 | 0.6891775 |
| TCBI | UMBF | 0.0100000 | 36.47614 | 0.0229249 | 0.0379147 | 0.6858607 |
| NHI | IRT | 0.0100000 | 31.43695 | 0.0221344 | 0.0513428 | 0.6834500 |
| OGS | SR | 0.0100000 | 60.76330 | 0.0434783 | 0.0631912 | 0.6826635 |
| ZWS | WTS | 0.0196690 | 26.42704 | 0.0537549 | 0.0363349 | 0.6813396 |
| AUB | HWC | 0.0100000 | 45.68916 | 0.0434783 | 0.0221169 | 0.6793768 |
| FELE | ZWS | 0.0167806 | 28.76564 | 0.0395257 | 0.0481833 | 0.6783621 |
| OUT | NHI | 0.0100000 | 32.49217 | 0.0308300 | 0.0244866 | 0.6781651 |
| IBOC | ENVA | 0.0194988 | 33.46433 | 0.0624506 | 0.0560821 | 0.6780390 |
| CATY | ASB | 0.0100000 | 38.17759 | 0.0276680 | 0.0134281 | 0.6720211 |
| NHI | SBRA | 0.0100000 | 47.80517 | 0.0387352 | 0.0505529 | 0.6702805 |
| CATY | AUB | 0.0206965 | 36.99237 | 0.0671937 | 0.0434439 | 0.6645615 |
| BKU | WSFS | 0.0310750 | 32.26862 | 0.0664032 | 0.0868878 | 0.6531103 |
| WSFS | AUB | 0.0100000 | 42.87030 | 0.0245059 | 0.0221169 | 0.6510033 |
| BGC | CATY | 0.0100000 | 50.73482 | 0.0260870 | 0.0315956 | 0.6276560 |
| RNST | VLY | 0.0109103 | 62.44461 | 0.0308300 | 0.0489731 | 0.6175357 |
| RNST | UMBF | 0.0100000 | 52.57616 | 0.0245059 | 0.0276461 | 0.6158717 |
| INDB | ONB | 0.0262279 | 43.31820 | 0.0561265 | 0.0710900 | 0.6074868 |
| UCB | HWC | 0.0100000 | 60.57633 | 0.0245059 | 0.0292259 | 0.6050154 |
| IBOC | HWC | 0.0100000 | 67.32100 | 0.0229249 | 0.0450237 | 0.5861110 |
| IBOC | UMBF | 0.0100000 | 59.51834 | 0.0284585 | 0.0071090 | 0.5764147 |
| AX | UMBF | 0.0162909 | 49.75532 | 0.0521739 | 0.0197472 | 0.5743123 |
| UCB | ABCB | 0.0100000 | 71.37739 | 0.0371542 | 0.0094787 | 0.5740845 |
| IBOC | ASB | 0.0100000 | 87.77295 | 0.0371542 | 0.0505529 | 0.5669575 |
| OGS | POR | 0.0244528 | 40.17737 | 0.0403162 | 0.0402844 | 0.5587288 |
| WSFS | IBOC | 0.0147520 | 78.78158 | 0.0292490 | 0.0837283 | 0.5516586 |
| MWA | ENS | 0.0218357 | 35.78825 | 0.0434783 | 0.0371248 | 0.5487730 |
| IBOC | ABCB | 0.0100000 | 73.11269 | 0.0260870 | 0.0300158 | 0.5453746 |
| IBOC | FULT | 0.0100000 | 87.65794 | 0.0276680 | 0.0442338 | 0.5427783 |
| FLG | FFIN | 0.0385546 | 28.28036 | 0.0758893 | 0.0497630 | 0.5340379 |
| WSFS | UCB | 0.0170780 | 60.23634 | 0.0324111 | 0.0355450 | 0.5330286 |
| IBOC | TCBI | 0.0100000 | 71.28163 | 0.0229249 | 0.0284360 | 0.5276707 |
| AUB | UMBF | 0.0296066 | 41.92046 | 0.0403162 | 0.0466035 | 0.5273374 |
| IBOC | SFBS | 0.0204414 | 45.20563 | 0.0276680 | 0.0323855 | 0.5230209 |
| BGC | ASB | 0.0329680 | 42.67259 | 0.0498024 | 0.0450237 | 0.5202895 |
| ASB | TCBI | 0.0431299 | 29.13923 | 0.0474308 | 0.0292259 | 0.5191872 |
| HWC | UMBF | 0.0400135 | 39.94392 | 0.0434783 | 0.0671406 | 0.5176086 |
| CATY | TCBI | 0.0247289 | 40.61629 | 0.0276680 | 0.0331754 | 0.5144050 |
| CDP | IRT | 0.0240036 | 46.32832 | 0.0584980 | 0.0513428 | 0.5075988 |
| INDB | GBCI | 0.0469137 | 38.60072 | 0.0521739 | 0.0529226 | 0.5005500 |
| BGC | HWC | 0.0232832 | 61.60283 | 0.0387352 | 0.0363349 | 0.4898185 |
| UCB | TCBI | 0.0175491 | 78.64111 | 0.0371542 | 0.0371248 | 0.4773141 |
| UCB | AX | 0.0372862 | 40.28011 | 0.0395257 | 0.0387046 | 0.4697040 |
| OTTR | NJR | 0.0242418 | 70.84344 | 0.0371542 | 0.0663507 | 0.4658995 |
| BGC | ABCB | 0.0380811 | 53.19447 | 0.0513834 | 0.0410742 | 0.4620384 |
| TCBI | VLY | 0.0304324 | 53.55636 | 0.0308300 | 0.0513428 | 0.4618092 |
| SR | POR | 0.0400616 | 32.64992 | 0.0284585 | 0.0229068 | 0.4617824 |
| FELE | UFPI | 0.0440843 | 45.46943 | 0.0466403 | 0.0466035 | 0.4481871 |
| BGC | VLY | 0.0229153 | 68.43908 | 0.0276680 | 0.0418641 | 0.4424950 |
| ACA | WTS | 0.0852277 | 22.26249 | 0.0637523 | 0.0163636 | 0.4361188 |
| CATY | VLY | 0.0766411 | 34.73599 | 0.0561265 | 0.0505529 | 0.4357837 |
| ASB | ABCB | 0.0920708 | 38.04586 | 0.0679842 | 0.0639810 | 0.4330473 |
| SLG | SKT | 0.0637475 | 34.25543 | 0.0347826 | 0.0616114 | 0.4317217 |
| SBRA | RHP | 0.0473032 | 41.55675 | 0.0505929 | 0.0537125 | 0.4292167 |
| FIBK | AUB | 0.0303268 | 72.34237 | 0.0308300 | 0.0560821 | 0.4283130 |
| AUB | AX | 0.0410034 | 39.85122 | 0.0363636 | 0.0213270 | 0.4249584 |
| FULT | ASB | 0.0540782 | 45.50663 | 0.0474308 | 0.0355450 | 0.4146750 |
| CDP | OUT | 0.0411972 | 42.84733 | 0.0466403 | 0.0497630 | 0.4145982 |
| WSFS | AX | 0.0353458 | 67.15111 | 0.0505929 | 0.0410742 | 0.4128501 |
| CDP | CTRE | 0.0280015 | 49.94401 | 0.0379447 | 0.0347551 | 0.4127500 |
| ONB | UMBF | 0.0189023 | 92.54043 | 0.0347826 | 0.0063191 | 0.4086798 |
| CBU | FFIN | 0.0763476 | 41.30273 | 0.0521739 | 0.0442338 | 0.3954279 |
| RNST | AX | 0.0273321 | 90.24370 | 0.0434783 | 0.0481833 | 0.3890620 |
| IBOC | PIPR | 0.0496822 | 48.67566 | 0.0363636 | 0.0481833 | 0.3866275 |
| RDN | VLY | 0.0281881 | 60.20865 | 0.0292490 | 0.0205371 | 0.3828196 |
| OGS | TXNM | 0.0999563 | 46.38068 | 0.0695652 | 0.0592417 | 0.3811694 |
| CBU | FBP | 0.0558204 | 51.24446 | 0.0403162 | 0.0458136 | 0.3792402 |
| AEO | URBN | 0.0648587 | 53.96487 | 0.0521739 | 0.0576619 | 0.3764495 |
| CATY | ABCB | 0.0610190 | 37.92438 | 0.0324111 | 0.0229068 | 0.3755930 |
| KBH | IBP | 0.0581278 | 45.07792 | 0.0426877 | 0.0189573 | 0.3727366 |
| SWX | POR | 0.0639836 | 45.26568 | 0.0411067 | 0.0592417 | 0.3692214 |
| FULT | TCBI | 0.0705543 | 39.90537 | 0.0379447 | 0.0418641 | 0.3629303 |
| IBOC | GBCI | 0.0400213 | 64.97056 | 0.0205534 | 0.0481833 | 0.3524255 |
| PIPR | ONB | 0.0616727 | 61.45374 | 0.0466403 | 0.0545024 | 0.3457987 |
| MWA | NPO | 0.0514109 | 38.41018 | 0.0371542 | 0.0221169 | 0.3455351 |
| AUB | ONB | 0.0964546 | 50.09245 | 0.0490119 | 0.0703002 | 0.3433461 |
| GPI | ABG | 0.0944825 | 42.46309 | 0.0553360 | 0.0410742 | 0.3422301 |
| TCBI | HWC | 0.0534527 | 53.65439 | 0.0213439 | 0.0355450 | 0.3416343 |
| FIBK | TCBI | 0.0220729 | 98.43102 | 0.0213439 | 0.0355450 | 0.3301580 |
| OUT | RHP | 0.0807783 | 40.45557 | 0.0513834 | 0.0363349 | 0.3278605 |
| ASB | UMBF | 0.0758241 | 71.50611 | 0.0482213 | 0.0703002 | 0.3213590 |
| FULT | VLY | 0.0699648 | 57.73031 | 0.0316206 | 0.0497630 | 0.3181433 |
| NWE | SR | 0.0930485 | 41.13884 | 0.0363636 | 0.0458136 | 0.3154110 |
| HOMB | AX | 0.0302993 | 68.24230 | 0.0205534 | 0.0276461 | 0.3006546 |
| MHO | PATK | 0.0552816 | 75.12931 | 0.0395257 | 0.0505529 | 0.2887243 |
| BKU | RNST | 0.0984566 | 40.08376 | 0.0284585 | 0.0331754 | 0.2811918 |
| OTTR | SWX | 0.0476525 | 106.34067 | 0.0395257 | 0.0521327 | 0.2803033 |
| CDP | NHI | 0.0887642 | 57.26630 | 0.0521739 | 0.0600316 | 0.2682673 |
| SR | BKH | 0.0773060 | 54.58750 | 0.0260870 | 0.0339652 | 0.2551352 |
| BKU | AX | 0.0942112 | 62.20448 | 0.0411067 | 0.0624013 | 0.2391245 |
| FLG | UBSI | 0.0924003 | 57.63963 | 0.0577075 | 0.0402844 | 0.2344719 |
| AX | VLY | 0.0965462 | 59.86371 | 0.0371542 | 0.0126382 | 0.1976889 |
| AVA | OTTR | 0.0772125 | 62.24236 | 0.0268775 | 0.0086888 | 0.1809389 |
| SBRA | CTRE | 0.0909055 | 57.54684 | 0.0268775 | 0.0347551 | 0.1585796 |
| RDN | ABCB | 0.0730927 | 87.20046 | 0.0292490 | 0.0134281 | 0.1224459 |
| RDN | ASB | 0.0696406 | 91.38151 | 0.0276680 | 0.0134281 | 0.1182564 |
Equity Pairs Selection in the Russell 2000
A Stability-Focused, Industry-Constrained Statistical Arbitrage Pipeline
1 Motivation and Objective
Pairs trading strategies rely on identifying asset pairs whose relative price dynamics are stable and mean-reverting. In practice, many statistically appealing pairs fail out-of-sample due to regime shifts, structural breaks, or spurious correlations.
The objective of this analysis is to construct a robust pair selection pipeline for U.S. equities that:
- Emphasizes economic coherence via industry constraints
- Separates candidate generation from statistical validation
- Explicitly evaluates out-of-sample stability
- Prioritizes tradability, not just statistical significance
The focus is deliberately on pair selection, rather than signal generation, execution, or portfolio construction.
2 Data and Universe Construction
2.1 Equity Universe
The starting universe consists of equities from the Russell 2000, representing small-capitalization U.S. stocks.
To improve robustness and realism, the universe is filtered to remove:
- Securities with insufficient price history
- Illiquid or irregularly traded names
- Symbols with missing or inconsistent adjusted prices
Adjusted close prices are log-transformed to stabilize variance and allow linear modeling.
2.2 Industry Classification
Each security is assigned to an industry classification (e.g., GICS). All pair construction and evaluation are performed strictly within industry groups. This constraint:
- Eliminates spurious cross-industry relationships
- Enforces economic interpretability
- Aligns with how relative-value equity strategies are typically deployed
3 Train/Test Split
To explicitly evaluate stability, the sample is split into:
- Training period: used for estimation and selection
- Test period (~5 years): used only for validation
All model estimation (hedge ratios, cointegration tests, half-life estimation) is performed only on the training window unless explicitly stated otherwise. No information from the test window is used during candidate generation.
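As a minimal sketch of the split described above, the snippet below cuts a date-sorted series at a single boundary date so that no test-window observation can enter training-window estimation. The function name `chronological_split`, the toy dates, and the made-up log prices are illustrative assumptions, not the report's actual implementation.

```python
from datetime import date, timedelta

def chronological_split(dates, values, test_start):
    """Split a date-sorted series: rows before test_start train, the rest test."""
    train = [(d, v) for d, v in zip(dates, values) if d < test_start]
    test = [(d, v) for d, v in zip(dates, values) if d >= test_start]
    return train, test

# Toy series: 10 consecutive calendar days of made-up log prices (hypothetical data).
dates = [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]
log_prices = [3.00 + 0.01 * i for i in range(10)]

train, test = chronological_split(dates, log_prices, test_start=date(2024, 1, 8))
print(len(train), len(test))
```

Splitting on a single date rather than shuffling rows keeps the temporal ordering intact, which is what makes the test window a genuine out-of-sample check.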
4 Candidate Pair Generation
4.1 Return Correlation as Pre-Screen
Candidate pairs are generated using return correlations, computed over the training period. Return correlation is used only to reduce the combinatorial search space, not as evidence of mean reversion.
For each stock:
- Correlations are computed against peers within the same industry
- Top-N most correlated neighbors are retained
This produces a tractable, economically coherent candidate set while avoiding arbitrary clustering assumptions.
4.2 Refinement of Pairs
After de-duplication of symmetric pairs, the result is a long-form table of candidate pairs, which serves as the baseline pair set for all subsequent statistical analysis.
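The pre-screen and de-duplication steps can be sketched as follows. This is an illustrative outline only: the function names, the `top_n` default, the toy symbols, and the toy returns are all assumptions, and the report's actual correlation settings are not stated.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length return series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def candidate_pairs(returns, industry, top_n=2):
    """Keep each stock's top_n most correlated same-industry peers, then
    de-duplicate symmetric pairs so (A, B) and (B, A) appear only once."""
    pairs = set()
    for sym in returns:
        peers = [p for p in returns if p != sym and industry[p] == industry[sym]]
        ranked = sorted(peers, key=lambda p: pearson(returns[sym], returns[p]),
                        reverse=True)
        for peer in ranked[:top_n]:
            pairs.add(tuple(sorted((sym, peer))))  # canonical order removes mirrors
    return sorted(pairs)

# Hypothetical training-period returns and industry labels.
returns = {
    "AAA": [0.010, -0.020, 0.015, 0.000, -0.010],
    "BBB": [0.012, -0.018, 0.011, 0.001, -0.012],
    "CCC": [-0.010, 0.020, -0.005, 0.010, 0.000],
    "DDD": [0.011, -0.021, 0.014, 0.002, -0.009],
}
industry = {"AAA": "Banks", "BBB": "Banks", "CCC": "Banks", "DDD": "Utilities"}

pairs = candidate_pairs(returns, industry, top_n=1)
print(pairs)
```

In this toy universe `DDD` is highly correlated with `AAA` but never paired with it, because the industry constraint is applied before correlations are ranked.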
5 Pair Diagnostics and Metrics
Each candidate pair is evaluated using a consistent set of diagnostics designed to capture long-run relationship strength, mean-reversion dynamics, and practical tradability. Metrics are computed primarily on the training window, with selected diagnostics recomputed out-of-sample for validation.
5.1 Hedge Ratio Estimation
For each candidate pair, a hedge ratio is estimated using ordinary least squares on log prices over the training window:
\[ \log P_{x,t} = \alpha + \beta \log P_{y,t} + \varepsilon_t \]
The spread is then defined as the residual of this regression, evaluated at the fitted coefficients:
\[ s_t = \log P_{x,t} - \left(\alpha + \beta \log P_{y,t}\right) \]
This spread represents the relative mispricing between the two securities under the assumed linear relationship.
In addition to the hedge ratio itself, several fit-quality diagnostics are retained, including the regression R^2 and residual volatility. These metrics help distinguish tightly linked pairs from relationships that are statistically significant but economically loose.
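A minimal sketch of this estimation step, written with closed-form simple-regression formulas so that it needs no external libraries. The function name `hedge_ratio` and the toy prices are assumptions; the toy series is built so that the true coefficients are known exactly.

```python
from math import log, sqrt

def hedge_ratio(px, py):
    """OLS of log(px) on log(py). Returns alpha, beta, spread, R^2, residual vol."""
    lx = [log(p) for p in px]
    ly = [log(p) for p in py]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    syy = sum((b - my) ** 2 for b in ly)
    beta = sxy / syy                      # hedge ratio
    alpha = mx - beta * my                # intercept
    spread = [a - (alpha + beta * b) for a, b in zip(lx, ly)]
    ss_res = sum(e * e for e in spread)
    ss_tot = sum((a - mx) ** 2 for a in lx)
    r2 = 1.0 - ss_res / ss_tot
    resid_vol = sqrt(ss_res / (n - 2))    # residual standard error
    return alpha, beta, spread, r2, resid_vol

# Hypothetical prices constructed so that log(px) = log(2) + 1.5 * log(py) exactly.
py = [10.0, 11.0, 12.0, 11.5, 12.5, 13.0]
px = [2.0 * p ** 1.5 for p in py]

alpha, beta, spread, r2, resid_vol = hedge_ratio(px, py)
print(round(beta, 6), round(alpha, 6))
```

Because the regression includes an intercept, the training-window spread has mean zero by construction, which is why the later AR(1) approximation can omit a constant term.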
5.2 Cointegration Testing
Evidence of a stable long-run equilibrium relationship is assessed using an Augmented Dickey–Fuller (ADF) test applied to the training-period spread.
\[ \Delta s_t = \gamma s_{t-1} + \sum_{i=1}^{k} \phi_i \Delta s_{t-i} + \epsilon_t \]
\[ H_0: \gamma = 0 \ \text{(unit root)} \quad \text{vs.} \quad H_1: \gamma < 0 \ \text{(stationary)} \]
The ADF test evaluates whether the spread is stationary, providing a necessary (but not sufficient) condition for mean reversion. For each pair, both the ADF test statistic and p-value are recorded.
Cointegration is treated as an initial screening signal, not a final decision rule. Many cointegrated pairs exhibit impractically slow or unstable dynamics and are filtered later through additional diagnostics.
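To make the test regression concrete, the sketch below computes the Dickey-Fuller t-statistic for the simplest case of k = 0 lagged differences and no deterministic terms. It is an assumption-laden illustration, not the report's procedure: it returns only the statistic, because p-values require Dickey-Fuller-type critical values rather than Student-t tables. In practice a library routine such as `adfuller` in `statsmodels` would normally handle lag selection and p-values. The function name `df_statistic` and the simulated series are hypothetical.

```python
import random
from math import sqrt

def df_statistic(spread):
    """t-statistic for gamma in  ds_t = gamma * s_{t-1} + e_t  (k = 0 lags)."""
    lagged = spread[:-1]
    diffs = [b - a for a, b in zip(spread[:-1], spread[1:])]
    sxx = sum(x * x for x in lagged)
    gamma = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    resid = [d - gamma * x for x, d in zip(lagged, diffs)]
    sigma2 = sum(e * e for e in resid) / (len(diffs) - 1)
    return gamma, gamma / sqrt(sigma2 / sxx)

# Simulated examples: a mean-reverting AR(1) series and a random walk
# driven by the same shocks (both hypothetical).
random.seed(0)
mean_reverting, random_walk = [0.0], [0.0]
for _ in range(500):
    shock = random.gauss(0.0, 1.0)
    mean_reverting.append(0.5 * mean_reverting[-1] + shock)
    random_walk.append(random_walk[-1] + shock)

gamma_mr, t_mr = df_statistic(mean_reverting)
gamma_rw, t_rw = df_statistic(random_walk)
print(round(gamma_mr, 3), round(t_mr, 2), round(t_rw, 2))
```

The mean-reverting series should produce a strongly negative statistic, while the random walk's statistic stays close to zero, which is the contrast the screening step relies on.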
5.3 Mean Reversion Speed
To quantify the speed of mean reversion, the training-period spread is approximated using an AR(1) process:
\[ s_t = \rho s_{t-1} + \epsilon_t \]
The implied half-life of mean reversion, which is defined only for 0 < \rho < 1, is computed as:
\[ \text{Half-life} = -\frac{\log 2}{\log \rho} \]
Half-life provides a direct, interpretable measure of how quickly deviations from equilibrium decay. Pairs with extremely long half-lives are penalized, as they imply slow convergence and extended holding periods, while extremely short half-lives are treated cautiously as potential noise-driven artifacts.
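As a worked illustration of the half-life formula, the sketch below fits the AR(1) coefficient by least squares without an intercept and converts it to a half-life. The function name `ar1_half_life` and the toy spread are assumptions. The toy series decays by exactly 3% per step, so the fitted coefficient is known in advance.

```python
from math import log

def ar1_half_life(spread):
    """Fit s_t = rho * s_{t-1} + e_t (no intercept); return (rho, half-life)."""
    lagged, current = spread[:-1], spread[1:]
    rho = sum(a * b for a, b in zip(lagged, current)) / sum(a * a for a in lagged)
    if not 0.0 < rho < 1.0:
        return rho, float("inf")   # no well-defined monotone decay
    return rho, -log(2) / log(rho)

# Hypothetical noise-free spread that shrinks by 3% each period.
toy_spread = [0.97 ** t for t in range(50)]

rho, half_life = ar1_half_life(toy_spread)
print(round(rho, 4), round(half_life, 2))   # rho = 0.97, half-life roughly 22.8 periods
```

The guard for values of rho outside (0, 1) is a design choice of this sketch: an infinite half-life is returned so that such pairs fall to the bottom of any ranking rather than raising an error.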
5.4 Tradability Proxies
Statistical validity alone is insufficient for practical deployment. Several additional diagnostics are included to proxy real trading behavior:
- Spread volatility (standard deviation and interquartile range)
- Zero-crossing frequency, measuring how often the spread crosses its mean
- Excursion frequency, defined as the proportion of time the spread deviates beyond ±2 standard deviations
These measures help identify spreads that are both active and stable, filtering out relationships that are statistically stationary but rarely generate actionable deviations.
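The zero-crossing and excursion proxies can be sketched as simple fractions of the sample. The exact definitions behind the `zero_cross` and `pct_outside_2sd` columns in the results table are not given in the text, so the normalization used here (crossings per transition, excursions per observation) is an assumption, as are the function name and the toy spread.

```python
from math import sqrt

def tradability_proxies(spread, k=2.0):
    """Return (zero-crossing frequency, share of points beyond k std devs)."""
    n = len(spread)
    mean = sum(spread) / n
    sd = sqrt(sum((s - mean) ** 2 for s in spread) / (n - 1))
    centered = [s - mean for s in spread]
    # A crossing occurs when consecutive demeaned values change sign.
    crossings = sum(1 for a, b in zip(centered[:-1], centered[1:]) if a * b < 0)
    zero_cross = crossings / (n - 1)
    pct_outside = sum(1 for c in centered if abs(c) > k * sd) / n
    return zero_cross, pct_outside

# Hypothetical spread: oscillates around zero, with two large excursions at the end.
toy_spread = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 4.0, -4.0]

zero_cross, pct_outside = tradability_proxies(toy_spread)
print(zero_cross, pct_outside)   # 1.0 and 0.2 for this toy series
```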
6 Out-of-Sample Validation
To assess robustness, key diagnostics are recomputed on a held-out five-year test window, including:
- Cointegration p-values
- Mean-reversion half-life
Rather than enforcing strict pass/fail rules, out-of-sample behavior is incorporated as a stability signal. Pairs that retain cointegration and similar mean-reversion characteristics out-of-sample are rewarded, while pairs exhibiting large regime-dependent shifts are penalized.
This approach balances robustness with flexibility, avoiding excessive reliance on any single test while still discouraging overfit relationships.
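One way to turn this idea into a number is sketched below, under two explicit assumptions that the text does not confirm. First, the training-window coefficients are frozen and applied unchanged to test-window prices. Second, the change in half-life is summarized by an absolute log-ratio. The function names and toy data are hypothetical.

```python
from math import exp, log

def spread_from(alpha, beta, px, py):
    """Spread on new prices using previously estimated (frozen) alpha and beta."""
    return [log(a) - (alpha + beta * log(b)) for a, b in zip(px, py)]

def half_life(spread):
    """Half-life implied by a no-intercept AR(1) fit; inf if rho is not in (0, 1)."""
    lagged, current = spread[:-1], spread[1:]
    rho = sum(a * b for a, b in zip(lagged, current)) / sum(a * a for a in lagged)
    return -log(2) / log(rho) if 0.0 < rho < 1.0 else float("inf")

def half_life_shift(hl_train, hl_test):
    """Absolute log-ratio of half-lives: 0 means unchanged, larger means a bigger shift."""
    return abs(log(hl_test / hl_train))

# Hypothetical test-window prices whose spread decays by 10% per period
# around the frozen relationship log(px) = alpha + beta * log(py).
alpha, beta = 0.2, 1.1
py_test = [20.0 + 0.1 * t for t in range(30)]
px_test = [exp(alpha + beta * log(p) + 0.1 * 0.9 ** t) for t, p in enumerate(py_test)]

hl_test = half_life(spread_from(alpha, beta, px_test, py_test))
shift = half_life_shift(20.0, 40.0)   # a doubling of half-life gives log(2)
print(round(hl_test, 3), round(shift, 3))
```

A symmetric log-ratio treats a doubling and a halving of the half-life as equally large shifts, which suits a stability signal that should not care about the direction of the change.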
7 Pair Ranking Framework
Candidate pairs are ranked using a composite scoring framework that combines:
- Training-period cointegration strength
- Out-of-sample cointegration confirmation
- Mean-reversion speed
- Stability of dynamics across regimes
- Tradability proxies such as zero-crossings and spread excursions
Each component is normalized within the candidate set and combined using a weighted average. Hard thresholds are applied sparingly to remove clearly unsuitable pairs, while most decisions are handled through continuous scoring to avoid brittle cutoff effects.
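The normalize-then-weight step can be sketched as below. Min-max scaling, the three metrics chosen, and the weights are all illustrative assumptions, since the report does not state its normalization or weights. The input values are three rows taken from the results table (FELE/SPXC, IBOC/ASB, RDN/ASB), but the output is not expected to reproduce the table's `score` column.

```python
def min_max(values, higher_is_better=True):
    """Scale values to [0, 1] within the candidate set; flip if lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)       # no information when all values tie
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def composite_scores(metrics, weights, higher_is_better):
    """Weighted average of normalized metrics, one score per pair."""
    n = len(next(iter(metrics.values())))
    total_w = sum(weights.values())
    normed = {m: min_max(metrics[m], higher_is_better[m]) for m in weights}
    return [sum(weights[m] * normed[m][i] for m in weights) / total_w
            for i in range(n)]

# Rows from the results table: FELE/SPXC, IBOC/ASB, RDN/ASB.
metrics = {
    "coint_pvalue": [0.0100000, 0.0100000, 0.0696406],
    "half_life": [22.13237, 87.77295, 91.38151],
    "zero_cross": [0.0806324, 0.0371542, 0.0276680],
}
weights = {"coint_pvalue": 0.4, "half_life": 0.3, "zero_cross": 0.3}  # hypothetical
higher_is_better = {"coint_pvalue": False, "half_life": False, "zero_cross": True}

scores = composite_scores(metrics, weights, higher_is_better)
print([round(s, 3) for s in scores])
```

Because every component is scaled relative to the candidate set, a pair's score changes whenever the set changes, so scores are comparable only within a single run.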
The resulting ranked list emphasizes stability, interpretability, and practical relevance rather than in-sample optimization.
8 Example Pairs
8.1 NWE vs. POR
8.2 MHO vs. IBP
8.3 SLG vs. MAC
9 Summary and Next Steps
This analysis presents a disciplined approach to equity pair selection that prioritizes robustness over complexity. By combining industry constraints, lightweight pre-screening, rigorous statistical diagnostics, and explicit out-of-sample validation, the resulting pairs are better aligned with real-world statistical arbitrage constraints.
Future extensions could include rolling-window persistence metrics, hedge ratio stability diagnostics, or integration with execution-aware backtesting frameworks.
10 Limitations
This work focuses intentionally on pair selection rather than full strategy backtesting or deployment. Several important considerations are therefore out of scope:
- Transaction costs and market impact: The analysis does not model bid–ask spreads, slippage, or borrow costs, all of which can materially affect realized performance for small-cap equities.
- Execution and signal design: While spreads and z-scores are computed for diagnostics, the report does not define entry/exit rules, position sizing, or portfolio constraints.
- Structural breaks: Cointegration and half-life are evaluated on fixed train/test windows, but real-world relationships may shift within regimes. A rolling persistence study would provide a stronger view of temporal stability.
- Survivorship and data quality risks: Results depend on the completeness and correctness of the Russell 2000 membership and price data. Corporate actions, symbol changes, and missing observations can bias estimates if not carefully controlled.
- Multiple hypothesis testing: Screening many candidate pairs increases the chance of false positives. Out-of-sample evaluation mitigates this risk but does not eliminate it.